Introducing GATA3 as a prominent player in Crohn’s disease

Aim: This study aimed to assess genes related to Crohn's disease (CD) through protein-protein interaction (PPI) network analysis in order to find crucial genes.
Background: CD is a major subtype of inflammatory bowel disease (IBD) that affects the gastrointestinal tract. PPI network analysis is a suitable tool for identifying a critical gene as a drug target or diagnostic biomarker for this type of disease.
Methods: Gene expression profile GSE126124 of 20 CD patients and 20 healthy controls was obtained from the Gene Expression Omnibus (GEO) database. The RNA profiles of peripheral blood mononuclear cells (PBMCs) and colon biopsy samples of the studied groups were investigated. Crucial genes were selected and analyzed via a PPI network in Cytoscape software. Gene ontology enrichment for the hubs, bottlenecks, and hub-bottlenecks was performed via the ClueGO plugin of Cytoscape.
Results: Eighty-one differentially expressed genes (DEGs) among 250 initial DEGs were highlighted as significant by FC > 2 and p-value ≤ 0.05, and 69 significant DEGs were used for PPI network construction. The network was poorly connected, so the 20 top neighbors were added to form a scale-free network. The main connected component included 39 query DEGs and 20 added first neighbors. Three clusters of biological processes associated with the crucial genes were identified and discussed.
Conclusion: The results of this study indicate that GATA3 has a key role in CD pathogenesis and could be a possible drug target or diagnostic biomarker for Crohn's disease.


Introduction
Inflammatory bowel diseases (IBD) are chronic relapsing disorders that can affect the entire intestine and are accompanied by long-term morbidity (1)(2)(3). The two main subgroups of IBD are Crohn's disease (CD) and ulcerative colitis (UC) (4,5). CD affects the gastrointestinal (GI) tract and causes abdominal pain, fever, and clinical signs of bowel obstruction (5)(6)(7)(8). A combination of genetic and environmental risk factors is associated with CD susceptibility, and numerous studies have pointed to the role of different gene groups in the pathophysiology of this disease (9)(10)(11)(12). Genetic studies using molecular biology, high-throughput methods, and bioinformatics approaches have tried to uncover the molecular mechanisms and biochemical pathways involved in CD development (13,14).

Protein-protein interaction (PPI) network analysis is an approach to evaluate dysregulated genes (queried genes) in patients compared to healthy individuals through a scale-free network. The topological properties of the resulting scale-free network contain useful information for discriminating among the queried dysregulated genes (15,16). Centrality measures such as degree centrality (K) and betweenness centrality (BC) are used to analyze the role of network elements. Degree is the number of first neighbors of a node (equal to the number of links between the node and its first neighbors), and BC is a function of the shortest paths that pass through the node. Both measures are widely used in network analysis (17). Based on degree and BC, three types of critical nodes can be determined: hub, bottleneck, and hub-bottleneck nodes. The top nodes by degree are known as hubs, and the top nodes by BC are known as bottlenecks. Hub-bottleneck nodes are those that are both hubs and bottlenecks.
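To make the two centrality measures concrete, the following minimal Python sketch computes degree and betweenness on a small toy graph. It is an illustration only: the node labels A to E and their edges are hypothetical, the function names are our own, and the betweenness values are left unnormalized (network tools commonly rescale them to the range 0 to 1, which is assumed to be the form of the BC values reported in this study). Betweenness is computed here with Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque


def degree_centrality(adj):
    """Degree K of each node: the number of its first neighbors."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and recording the predecessors of each node on those paths.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair (s, t) was visited from both ends, so halve.
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical toy network (not data from this study).
toy_network = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A", "E"},
    "E": {"D"},
}
toy_degree = degree_centrality(toy_network)
toy_bc = betweenness_centrality(toy_network)
```

In this toy graph, node A has the most first neighbors and also lies on the most shortest paths, so it would be both the top hub and the top bottleneck.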
This study aimed to find a possible new biomarker for Crohn's disease through analyzing and screening related genes. The crucial genes were enriched via the gene ontology method.

Methods
Gene expression profile GSE126124 of 20 CD patients (10 males and 10 females) and 20 healthy controls (10 males and 10 females) was extracted from the Gene Expression Omnibus (GEO) database. The RNA profiles of peripheral blood mononuclear cells (PBMCs) and colon biopsy samples from patients with CD vs. controls were investigated. Patients and controls were 8 to 18 years old. Gene expression distributions were evaluated through boxplot analysis using GEO2R. The top 250 differentially expressed genes (DEGs) based on p-value were selected for further investigation. Eighty-one DEGs characterized by fold change (FC) > 2 and p-value ≤ 0.05 were considered significant DEGs. The significant DEGs plus 20 first neighbors were included in a PPI network built with Cytoscape software v 3.7.2 and STRING. The constructed network was analyzed with NetworkAnalyzer (an application of Cytoscape) to identify crucial genes. Degree and betweenness centrality (BC) were used to find critical DEGs. The top 10% of nodes in the main connected component based on degree and BC were considered hub and bottleneck nodes, respectively. Nodes that were both hub and bottleneck were designated hub-bottlenecks. Gene ontology enrichment for the hubs, bottlenecks, and hub-bottlenecks was performed via the ClueGO plugin of Cytoscape. Significant biological terms were identified based on p-value ≤ 0.05. The terms fell into three groups, which were linked to the hubs, bottlenecks, and hub-bottleneck.
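The two selection steps described above (filtering significant DEGs, then taking the top 10% of nodes by degree and by BC) can be sketched in Python as follows. This is a hedged illustration, not the exact procedure run in GEO2R or Cytoscape. It assumes that fold change is available as a log2 value and that FC > 2 is applied in both directions (absolute log2 FC > 1), and it rounds the 10% cutoff upward; neither choice is stated in the text. All gene labels, node labels, and values are hypothetical.

```python
import math


def significant_degs(rows, fc_cutoff=2.0, p_cutoff=0.05):
    """Keep genes passing both thresholds.

    rows: iterable of (gene, log2_fold_change, p_value).
    Assumption: FC > fc_cutoff is applied to up- and down-regulation alike.
    """
    min_abs_log_fc = math.log2(fc_cutoff)
    return [
        gene
        for gene, log_fc, p_value in rows
        if abs(log_fc) > min_abs_log_fc and p_value <= p_cutoff
    ]


def top_percent(scores, percent=10):
    """Return the top `percent` of nodes by score (at least one node).

    Assumption: the cutoff is rounded up; ties keep insertion order.
    """
    k = max(1, math.ceil(len(scores) * percent / 100))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


# Hypothetical expression rows: (gene, log2 FC, p-value).
example_rows = [
    ("g1", 1.5, 0.01),   # passes both thresholds
    ("g2", 0.4, 0.001),  # fold change too small
    ("g3", -1.2, 0.04),  # down-regulated, passes both thresholds
    ("g4", 2.0, 0.20),   # p-value too large
]
example_degs = significant_degs(example_rows)

# Hypothetical centrality scores for a 20-node component.
example_degree = {f"n{i}": 20 - i for i in range(20)}
example_bc = {f"n{i}": 0.0 for i in range(20)}
example_bc["n1"] = 0.3
example_bc["n7"] = 0.2

hubs = top_percent(example_degree)        # top 10% by degree
bottlenecks = top_percent(example_bc)     # top 10% by BC
hub_bottlenecks = hubs & bottlenecks      # nodes in both sets
```

Taking the set intersection at the end mirrors the definition of a hub-bottleneck as a node that appears in both the hub list and the bottleneck list.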

Results
Gene expression profiles of 20 CD patients and 20 controls were matched by boxplot analysis (Figure 1).

Figure 1. Distribution of gene expression changes in CD patients and controls. Blue-colored bars: CD patients | Pink-colored bars: Controls
Since the median lines of the samples were aligned, the studied profiles were comparable. The top 250 DEGs, based on p-value, were identified by the GEO2R analyzer. Among them, 81 DEGs were characterized as significant by FC > 2 and p-value ≤ 0.05. Sixty-nine (85%) of the 81 significant DEGs were recognized by the STRING database and were candidates for constructing the PPI network. The resulting network was not considered scale-free because of the weak interactions between the included DEGs. As most of the queried DEGs were isolated, 20 first neighbors were added to the queried DEGs to form a scale-free network (Figure 2).

Seven genes (IL-7, GATA3, CD28, CD5, TXN, ETS-1, and CCR7) were considered crucial genes, and since GATA3 is common to the hubs and bottlenecks, it is termed a hub-bottleneck. Three clusters of biological processes related to the seven crucial genes were determined (Figure 3). For better understanding, the biological processes were screened to find the role of the crucial genes (Table 3).
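As a rough illustration of why isolated query genes drop out of the centrality analysis, the sketch below separates a network into connected components and keeps the largest one, which corresponds to the main connected component used in this study. The node labels and edges are hypothetical, and the function names are our own; this is not the procedure implemented in Cytoscape.

```python
from collections import deque


def connected_components(adj):
    """Return the connected components of an undirected graph as node sets."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components


def main_connected_component(adj):
    """Largest component by node count."""
    return max(connected_components(adj), key=len)


# Hypothetical query network: q4 is isolated, q5 and q6 form a small island.
query_network = {
    "q1": {"q2", "q3"},
    "q2": {"q1"},
    "q3": {"q1"},
    "q4": set(),
    "q5": {"q6"},
    "q6": {"q5"},
}
main_component = main_connected_component(query_network)
isolated_nodes = sorted(n for n, nbrs in query_network.items() if not nbrs)
```

Only nodes inside the main component receive meaningful degree and BC rankings, which is why adding first neighbors that link isolated query genes changes which genes can be evaluated.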

Discussion
Proteomic studies have provided valuable information about the molecular mechanism of CD (18). The results of proteomic or genomic studies contain a large number of differentially expressed genes or proteins (DEGs/DEPs) between patients and healthy controls. As a bioinformatics approach, PPI network analysis is closely tied to proteomic and genomic investigations (19,20). In this study, PPI network analysis was applied to screen the known proteins or genes related to Crohn's disease. Among the 81 queried DEGs related to Crohn's disease, 7 genes (IL-7, GATA3, CD28, CD5, TXN, ETS-1, and CCR7) were introduced as crucial dysregulated genes. Dysregulation of cytokines and growth factors plays a major role in the malfunction of the immune system in inflammatory bowel disease (21). IL-7 signals are involved in the development of chronic intestinal inflammation and human intestinal diseases (22). As depicted in Figure 3 and Table 3, IL-7 is related to T cell differentiation in the thymus, which is the dominant class of biological terms and contains 16 biological terms (about 89% of all known terms). According to previous studies, the lymphocyte population is reduced in the peripheral blood of CD patients because of a decrease in IL-7 levels (23). Kader et al. (24) reported significantly higher levels of IL-7 in CD patients in remission compared to patients with the active form.
GATA3 is the only hub-bottleneck node and is characterized by a degree of 28 and a BC of 0.038. Hub-bottleneck nodes are known as the key elements of a PPI network, and several studies have addressed their role in the incidence and development of diseases (16). Like IL-7, GATA3 is involved in T cell differentiation in the thymus. Moreover, GATA3 and ETS-1 are the two dysregulated genes associated with positive regulation of pri-miRNA transcription by RNA polymerase II. GATA3 is a transcription factor that can control Foxp3 (Forkhead box P3) expression and regulatory T (Treg) cell function. GATA3 is an essential factor for immune tolerance, and its removal is associated with autoimmune diseases (25). Li et al. (26) reported a decreased Gata3+ Foxp3+ cell population in severe CD patients compared to controls.

Research has shown that TXN is involved in various biological pathways and processes (27). TXN activities include protecting proteins against aggregation and oxidative stress, which helps cells counter environmental stresses. TXN also has a direct role in apoptosis, inflammatory processes, and cell growth (28)(29)(30)(31). The nitric oxide pathway is a well-recognized biological term in Crohn's disease (32). In our investigation, the role of TXN in the response to nitric oxide is highlighted. Oxidative stress has an important role in IBD pathogenesis, and reduced and oxidized thioredoxin have a key role in intestinal redox biology.

Table 3. The biological terms and pathways concerning hub, bottleneck, and hub-bottleneck genes, including IL-7, GATA3, TXN, ETS-1, CCR7, CD28, and CD5.
Row | GO term | Gene name
1 | response to nitric oxide | CCR7, TXN
2 | positive regulation of pri-miRNA transcription by RNA polymerase II | ETS-1, GATA3
3 | T cell differentiation in thymus | CCR7, CD28, GATA3, IL-7
4 | inflammatory response to antigenic stimulus | CCR7, CD28, GATA3
5 | interleukin-4 production | CD28, GATA3
6 | acute inflammatory response to antigenic stimulus | CCR7, GATA3
7 | lymphocyte costimulation | CCR7, CD28, CD5
8 | regulation of inflammatory response to antigenic stimulus | CCR7, CD28
9 | regulation of interleukin-4 production | CD28, GATA3
10 | hypersensitivity | CCR7, GATA3
11 | positive regulation of inflammatory response to antigenic stimulus | CCR7, CD28
12 | positive regulation of interleukin-4 production | CD28, GATA3
13 | T

As discussed, intestinal inflammation, immune tolerance, intestinal redox balance, the CD activity index, autoimmunity control, and, consequently, the development of CD-related presentations are the main processes associated with the seven critical genes (IL-7, GATA3, CD28, CD5, TXN, ETS-1, and CCR7). Evaluating these critical genes can help identify a key element as a drug target or diagnostic biomarker for Crohn's disease. In this study, GATA3 was highlighted as the most important CD-related gene, with a dramatic difference in expression and a position as the single hub-bottleneck of the network.

Through network analysis, we can introduce IL-7, GATA3, TXN, ETS-1, CCR7, CD28, and CD5 as critical genes related to Crohn's disease and as a possible biomarker panel. GATA3 was highlighted as the key element among them, and it was concluded that GATA3 might be a possible drug target or diagnostic biomarker for the disease. Notably, these differentially expressed genes can be investigated in greater detail in further studies with a larger number of CD patients.